The Supermen
Fictional Characters in DBpedia, Freebase and other Generic Databases
This is based on a response to a person who was looking at the record for the DC Comics character Superman in :BaseKB, which is derived from Freebase.
What I see in Freebase right now (June 2014)
https://www.freebase.com/m/070vn#/award/ranked_item
doesn't contain anything that strikes me as wrong, but there is a
split discussion attached to it, and it's a little fishy that the topic
was created on June 4, 2013. If I look at it in the gold copy of :BaseKB dated March 2014,
https://aws.amazon.com/marketplace/pp/B00KDO5IFA
and run this query with the sql command:
SQL> sparql select ?type { <http://rdf.basekb.com/ns/m.070vn> a ?type .};
type
LONG VARCHAR
_______________________________________________________________________________
http://rdf.basekb.com/ns/theater.theater_character
http://rdf.basekb.com/ns/user.geektastique.superheroes.topic
http://rdf.basekb.com/ns/base.zxspectrum.topic
http://rdf.basekb.com/ns/common.topic
http://rdf.basekb.com/ns/award.ranked_item
http://rdf.basekb.com/ns/base.ontologies.ontology_instance
http://rdf.basekb.com/ns/base.tagit.concept
http://rdf.basekb.com/ns/book.book_character
http://rdf.basekb.com/ns/comic_books.comic_book_character
http://rdf.basekb.com/ns/fictional_universe.fictional_character
http://rdf.basekb.com/ns/film.film_character
http://rdf.basekb.com/ns/film.film_subject
http://rdf.basekb.com/ns/tv.tv_character
http://rdf.basekb.com/ns/base.fictionaluniverse.topic
http://rdf.basekb.com/ns/cvg.game_character
http://rdf.basekb.com/ns/user.duck1123.default_domain.primary_identity
http://rdf.basekb.com/ns/user.duck1123.default_domain.adopted_character
http://rdf.basekb.com/ns/amusement_parks.ride_theme
http://rdf.basekb.com/ns/base.fictionaluniverse.cloned_character
http://rdf.basekb.com/ns/user.geektastique.superheroes.superhero
http://rdf.basekb.com/ns/user.jschell.default_domain.alter_ego
I don't see anything that's obviously wrong there either.
Types without Hierarchy
Types in Freebase typically mean that something plays a role. For instance, Superman is a :film.film_subject because he is the subject of a film. He is an :amusement_parks.ride_theme because amusement park rides have been themed around him. There's nothing contradictory about this, at least to first order, because these types don't fit into a hierarchy.
This is similar to what people call 'duck typing' in some programming languages, and it is the way an RDFS reasoner thinks. If we define
:film.film.subjects a rdf:Property .
:film.film.subjects rdfs:domain :film.film .
:film.film.subjects rdfs:range :film.film_subject .
and tell the reasoner that
:m.01_mdl :film.film.subjects :m.070vn .
it infers that
:m.01_mdl a :film.film .
:m.070vn a :film.film_subject .
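The domain and range inference above can be sketched in a few lines of Python. This is only an illustration of the two rules, not how any particular reasoner is implemented; triples are plain tuples of strings, and the function name `infer_types` is made up for this example.

```python
# Triples are (subject, predicate, object) tuples of strings.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"


def infer_types(schema, facts):
    """Apply the rdfs:domain and rdfs:range rules once over the facts."""
    domains = {}
    ranges = {}
    for s, p, o in schema:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)
        elif p == RDFS_RANGE:
            ranges.setdefault(s, set()).add(o)

    inferred = set()
    for s, p, o in facts:
        # The subject of a statement gets every class in the property's domain.
        for cls in domains.get(p, ()):
            inferred.add((s, RDF_TYPE, cls))
        # The object of a statement gets every class in the property's range.
        for cls in ranges.get(p, ()):
            inferred.add((o, RDF_TYPE, cls))
    return inferred


schema = [
    (":film.film.subjects", RDFS_DOMAIN, ":film.film"),
    (":film.film.subjects", RDFS_RANGE, ":film.film_subject"),
]
facts = [(":m.01_mdl", ":film.film.subjects", ":m.070vn")]

for triple in sorted(infer_types(schema, facts)):
    print(triple)
```

Nothing here asks whether :m.070vn "really" is a film subject. Being used as the object of :film.film.subjects is enough, which is the duck-typing flavor of RDFS.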
It's liberating to not have types in a strict hierarchy. You'll hear people say
:Person rdfs:subClassOf :Animal .
but lawyers will tell you that
:Corporation a :Person .
Taken together, these statements imply that :Corporation is an :Animal, which is absurd. The problem isn't any one of the statements; it's the fact that we're using :Person to mean two different things: a member of the species Homo sapiens versus an entity that can be a party to a contract.
The FOAF vocabulary partially resolves this problem by creating the foaf:Agent concept such that
foaf:Person rdfs:subClassOf foaf:Agent .
foaf:Organization rdfs:subClassOf foaf:Agent .
If we know
:m.070vn a foaf:Person .
the system infers
:m.070vn a foaf:Agent .
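The subclass inference can be sketched the same way. The snippet below is a minimal illustration, assuming the class hierarchy is held in an ordinary dictionary; the helper names `superclasses` and `infer_all_types` are made up for this example.

```python
def superclasses(cls, subclass_of):
    """Return cls plus every class reachable through rdfs:subClassOf."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(subclass_of.get(current, ()))
    return seen


def infer_all_types(asserted_types, subclass_of):
    """Expand a set of asserted types with all of their superclasses."""
    result = set()
    for cls in asserted_types:
        result |= superclasses(cls, subclass_of)
    return result


# The two FOAF subclass statements from the text.
subclass_of = {
    "foaf:Person": {"foaf:Agent"},
    "foaf:Organization": {"foaf:Agent"},
}

# :m.070vn is asserted to be a foaf:Person only.
print(sorted(infer_all_types({"foaf:Person"}, subclass_of)))
```

The `seen` set also keeps the walk from looping forever if somebody publishes a cyclic hierarchy.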
By the language of the standard, ["Something is a Person if it is a person. We don't nitpic about whether they're alive, dead, real, or imaginary."](http://xmlns.com/foaf/spec/#term_Person) Though Superman is not a member of Homo sapiens, he looks like a person, talks like a person and flies faster than a speeding plane, so I guess he's a person.
Practically, you could map the Freebase :people.person to foaf:Person and map :business.employer to foaf:Organization and feel comfortable. How you map things from there is more subjective. If you don't want your system to call Jack Bauer for help, you don't have to map fictional characters to foaf:Person. It's a choice you make based on what you want your system to think.
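One way to picture that choice is as a lookup table you control. The sketch below assumes a simple dictionary from Freebase types to FOAF classes; the function `to_foaf` and both mapping names are hypothetical, and the optional third entry is exactly the subjective decision discussed above.

```python
# The two mappings most people would be comfortable with.
DEFAULT_MAPPING = {
    ":people.person": "foaf:Person",
    ":business.employer": "foaf:Organization",
}

# An optional, more permissive mapping that treats fictional characters as people.
PERMISSIVE_MAPPING = {
    **DEFAULT_MAPPING,
    ":fictional_universe.fictional_character": "foaf:Person",
}


def to_foaf(freebase_types, mapping=DEFAULT_MAPPING):
    """Return the FOAF classes implied by a set of Freebase types."""
    return {mapping[t] for t in freebase_types if t in mapping}


superman_types = {
    ":fictional_universe.fictional_character",
    ":film.film_subject",
}

print(to_foaf(superman_types))                      # nothing mapped by default
print(to_foaf(superman_types, PERMISSIVE_MAPPING))  # Superman becomes a foaf:Person
```

Types that have no entry in the mapping are simply ignored, so adding or removing one line changes what the system believes without touching the data.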
Splitting hairs
In databases such as DBpedia, Freebase and Wikidata, concepts like
"Superman" get overloaded. The trouble is that they take multiple
forms; for instance Superman the character might have started in a
comic book, but he has been in movies and TV shows and been the
subject of pinball games, amusement park rides, video games, etc.
So when you say there are multiple topics with the same id, you are
right. Some people split topics finer than others do.
Worse than that, since Superman has been around a long time there
have been many different versions of him in the comic books. After
the 1980s "Crisis on Infinite Earths", Superman is officially the
"last Kryptonian", the only survivor of Krypton's explosion. Before
then there were Krypto, General Zod, and Supergirl, but they all got
wiped out, or sorta-kinda wiped out in the case of
http://en.wikipedia.org/wiki/Superman_prime
http://en.wikipedia.org/wiki/Supergirl_(Matrix)
Marvel is almost as bad, to the point where it is hard to make
statements about a subject like "Iron Man"; for instance, the
original Iron Man kept his identity secret from almost everybody,
including Pepper Potts, but he's completely open about it in the recent
movies and comics. The Hulk has usually been named "Bruce Banner",
except on the 1970s TV show where he was named "David Banner".
Unless you split "Iron Man" and "The Hulk" into separate characters,
you can't make statements about even the most basic facts about them.
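To make the splitting problem concrete, here is a small sketch that looks for properties carrying more than one value on a single topic. The topic ids and the property name `civilian_name` are invented for illustration and are not real Freebase identifiers.

```python
from collections import defaultdict

# (topic id, property, value); the ids and property name are hypothetical.
merged = [
    ("hulk", "civilian_name", "Bruce Banner"),
    ("hulk", "civilian_name", "David Banner"),
]
split = [
    ("hulk/comics", "civilian_name", "Bruce Banner"),
    ("hulk/tv_1970s", "civilian_name", "David Banner"),
]


def conflicts(statements):
    """Return the (topic, property) pairs that carry more than one value."""
    values = defaultdict(set)
    for topic, prop, value in statements:
        values[(topic, prop)].add(value)
    return {key: vals for key, vals in values.items() if len(vals) > 1}


print(conflicts(merged))  # one conflict on the single merged topic
print(conflicts(split))   # no conflicts once the versions are separate topics
```

With a single topic the two names collide; once the TV version gets its own topic, each statement is unambiguous, at the cost of having more topics to maintain.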
Will the real Star Trek stand up?
You run into similar problems with "Star Trek", "Sailor Moon",
"Halo" and other media franchises.
If you need a finer grained description of a domain like this, you
could try to build it into Freebase or DBpedia through the community
process or you can create your own database. This starts with writing
a better schema, but there's the challenge that people might have a
hard time populating that schema or using it. I'd imagine a crack
ontologist who's obsessed with comic books would probably define 50 or
100 "Supermen" to model the illustrious history of what most people
think of as the one and only "Superman".
We're only human
There's a tension, however, between databases that are precise and databases that can be maintained by a community. What's in Wikipedia, for instance, is controlled by a battle between inclusionists and deletionists over what is "notable" enough to be in Wikipedia. Star Trek is notable enough that each episode has its own page, yet there are no individual pages for the 13,088 episodes of General Hospital. Although wikis dedicated to fictional worlds are encouraged on Wikia, detailed coverage of fictional worlds will always get pushback from deletionists in Wikipedia.
We can have simple databases that everyone can contribute to, or more complex databases that require you to be an ontologist and a comic fan at the same time. It would be nice, though, to see something that is to Wikia what Freebase and DBpedia are to Wikipedia.